Constraint-based causal discovery from multiple interventions over overlapping variable sets

Authors

  • Sofia Triantafillou
  • Ioannis Tsamardinos
Abstract

Scientific practice typically involves repeatedly studying a system, each time examining it from a different perspective. In each study, the scientist may take measurements under different experimental conditions (interventions, manipulations, perturbations) and measure different sets of quantities (variables). The result is a collection of heterogeneous data sets coming from different data distributions. In this work, we present the COmbINE algorithm, which accepts a collection of data sets over overlapping variable sets under different experimental conditions; COmbINE then outputs a summary of all causal models that simultaneously fit all of the input data sets, indicating which structural characteristics are invariant across those models and which vary. COmbINE converts estimated dependencies and independencies in the data into path constraints on the data-generating causal model and encodes them as a SAT instance. The algorithm is sound and complete in the sample limit. To account for conflicting constraints arising from statistical errors, we introduce a general method for sorting constraints in order of confidence, computed as a function of their corresponding p-values. In our empirical evaluation, COmbINE is more efficient than the only pre-existing similar algorithm; the latter additionally admits feedback cycles but does not admit conflicting constraints, which hinders its applicability to real data. As a proof of concept, COmbINE is employed to co-analyze four real mass-cytometry data sets measuring phosphorylated protein concentrations of overlapping protein sets under three different interventions.
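
The abstract's idea of ranking constraints by confidence and discarding those that conflict can be illustrated with a toy sketch. Everything below is an assumption made for illustration and is not taken from the paper: the `confidence` function is a simple placeholder rather than the authors' confidence measure, each Boolean variable stands for "an edge is present" rather than a path constraint, and satisfiability is checked by brute force instead of a real SAT solver.

```python
from itertools import product

ALPHA = 0.05  # hypothetical significance threshold


def confidence(p_value, alpha=ALPHA):
    """Placeholder confidence score (an assumption, not the paper's formula)."""
    # A dependence (p < alpha) is treated as more credible the smaller p is;
    # an independence (p >= alpha) as more credible the larger p is.
    return 1.0 - p_value if p_value < alpha else p_value


def satisfiable(clauses, n_vars):
    """Brute-force SAT check; each clause is a list of signed 1-based literals."""
    for bits in product([False, True], repeat=n_vars):
        if all(any(bits[abs(lit) - 1] == (lit > 0) for lit in clause)
               for clause in clauses):
            return True
    return False


def greedy_consistent_subset(constraints, n_vars):
    """Add constraints in decreasing confidence, skipping any that conflict."""
    accepted, clauses = [], []
    ranked = sorted(constraints, key=lambda c: confidence(c[1]), reverse=True)
    for name, _p_value, new_clauses in ranked:
        if satisfiable(clauses + new_clauses, n_vars):
            clauses += new_clauses
            accepted.append(name)
    return accepted


# Toy encoding: variable 1 = "X and Y are adjacent", variable 2 = "Y and Z are adjacent".
constraints = [
    ("indep(X,Y) in data set 2", 0.30, [[-1]]),
    ("dep(X,Y) in data set 1", 0.001, [[1]]),
    ("dep(Y,Z) in data set 1", 0.01, [[2]]),
]
accepted = greedy_consistent_subset(constraints, n_vars=2)
print(accepted)
```

In this toy run the low-confidence independence between X and Y contradicts the high-confidence dependence and is dropped, while the two dependencies are kept. A realistic implementation would hand the clauses to a dedicated SAT solver and use the path-based encoding the abstract describes.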

Similar resources

Discussion of "Learning Equivalence Classes of Acyclic Models with Latent and Selection Variables from Multiple Datasets with Overlapping Variables"

In automated causal discovery, the constraint-based approach seeks to learn an (equivalence) class of causal structures (with possibly latent variables and/or selection variables) that are compatible (according to some assumptions, usually the causal Markov and faithfulness assumptions) with the conditional dependence and independence relations found in data. In the paper under discussion, Till...

Joint Causal Inference on Observational and Experimental Datasets

We introduce Joint Causal Inference (JCI), a powerful formulation of causal discovery from multiple datasets that allows to jointly learn both the causal structure and targets of interventions from statistical independences in pooled data. Compared with existing constraint-based approaches for causal discovery from multiple data sets, JCI offers several advantages: it allows for several differe...

A Bayesian Approach to Causal Discovery

We examine the Bayesian approach to the discovery of directed acyclic causal models and compare it to the constraint-based approach. Both approaches rely on the Causal Markov assumption, but the two differ significantly in theory and practice. An important difference between the approaches is that the constraint-based approach uses categorical information about conditional-independence constraints...

Joint Probabilistic Inference of Causal Structure

Causal directed acyclic graphical models (DAGs) are powerful reasoning tools in the study and estimation of cause and effect in scientific and socio-behavioral phenomena. In many domains where the cause and effect structure is unknown, a key challenge in studying causality with DAGs is learning the structure of causal graphs directly from observational data. Traditional approaches to causal str...

Causal Discovery from Nonstationary/Heterogeneous Data: Skeleton Estimation and Orientation Determination

It is commonplace to encounter nonstationary or heterogeneous data, of which the underlying generating process changes over time or across data sets (the data sets may have different experimental conditions or data collection conditions). Such a distribution shift feature presents both challenges and opportunities for causal discovery. In this paper we develop a principled framework for causal ...

Journal:
  • Journal of Machine Learning Research

Volume 16

Publication date: 2015